feat(arrow-avro): accept default value of null for Avro union with null type in any branch position [avro 1.12] #9487

Open
mzabaluev wants to merge 5 commits into apache:main from mzabaluev:default-null-for-union-with-null-second

mzabaluev commented Feb 27, 2026

Which issue does this PR close?

Rationale for this change

Version 1.12 of the Avro specification relaxes the rule for default values of unions: the default may match any schema branch in the union, rather than only the first.

This change implements the new behavior for the specific case of a null default value, which matters for some real-world cases of Iceberg schema evolution. Spark converts nullable fields in its SQL schema to Avro union types with the null variant listed last. When a column is added to an Iceberg table backed by Avro files, the default value of its field in the reader schema must be specified as null.
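To make the scenario concrete, here is a rough sketch of the pair of schemas involved, held as Rust string constants. The record name `Row` and the field names `id` and `added_col` are hypothetical and chosen only for illustration; they are not taken from Spark, Iceberg, or this PR.

```rust
/// Writer schema of older data files, which predate the new column.
/// (Hypothetical record and field names.)
pub const WRITER_SCHEMA: &str = r#"{
  "type": "record",
  "name": "Row",
  "fields": [
    {"name": "id", "type": "long"}
  ]
}"#;

/// Reader schema after a nullable column has been added.
/// The union lists "null" last, and the default is null, which matches
/// the second branch rather than the first.
pub const READER_SCHEMA: &str = r#"{
  "type": "record",
  "name": "Row",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "added_col", "type": ["int", "null"], "default": null}
  ]
}"#;
```

Under the first-branch-only rule, the `added_col` default would be rejected because the first branch is `"int"`. Under the 1.12 rule it is accepted, so older files can be read with `added_col` filled in as null.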

What changes are included in this PR?

Introduce the "avro_1_12" feature as requested by #8703.
When this feature is enabled, validation of a null default value for union and nullable types allows null in any branch position (for unions treated as Arrow unions) and in either nullability order (for unions treated as nullable types).
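The difference between the two validation rules can be sketched as follows. This is a simplified model, not the arrow-avro implementation: the `Branch` enum and the `null_default_allowed` function are hypothetical, and the `any_branch` parameter stands in for whatever the "avro_1_12" feature toggles in the crate (presumably something like `cfg!(feature = "avro_1_12")`, which is an assumption here).

```rust
/// Simplified stand-in for an Avro union branch (hypothetical type,
/// not arrow-avro's real schema representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Branch {
    Null,
    Int,
    String,
}

/// Returns true if a JSON `null` default is acceptable for a union
/// with the given branches.
///
/// `any_branch` models the assumed effect of the "avro_1_12" feature.
pub fn null_default_allowed(branches: &[Branch], any_branch: bool) -> bool {
    if any_branch {
        // Avro 1.12 rule: the default may match any branch of the union.
        branches.contains(&Branch::Null)
    } else {
        // Earlier rule: the default must match the first branch.
        branches.first() == Some(&Branch::Null)
    }
}
```

With this model, `null_default_allowed(&[Branch::Int, Branch::Null], false)` is false and the same call with `true` is true, while a union without a null branch is rejected under both rules.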

Are these changes tested?

Added a test, gated by the newly introduced "avro_1_12" feature, that exercises the ["int", "null"] type with a default of null.

Are there any user-facing changes?

This is a behavioral change: more schema resolution cases are accepted than the Avro 1.11 spec permitted. Despite the feature gating, there may be unexpected effects, as explained in #8703 (comment).

Test the Avro 1.12 spec behavior of resolving default values
in the specific case when the default value for the field added in
the reader schema is null, and null is the second branch in the field's
union type.
github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels Feb 27, 2026

Introduce the "avro_1_12" feature flag and use it to guard the
behavior of JSON null defaults for union types that have the null
schema in a position other than the first.
mzabaluev marked this pull request as ready for review April 7, 2026 20:14
mzabaluev changed the title from "feat(arrow-avro): accept default value of null for Avro union with null type in any branch position" to "feat(arrow-avro): accept default value of null for Avro union with null type in any branch position [avro 1.12]" Apr 11, 2026